Measuring Similarity between XML Documents

Author

  • Christopher C. Yang
Abstract

With the advance of World Wide Web standards, XML documents have become popular in e-business applications for information exchange. Electronic catalogs and transaction records are now formatted in XML. XML documents are semi-structured documents whose XML schemas mark up the semantics. XML separates presentation from semantics, so the presentation of information on different devices can be processed independently of information management. However, information retrieval techniques for XML documents are still at an early stage. Most e-business applications use XML only for data interchange or presentation purposes, and advanced information retrieval techniques have not yet been applied to retrieving, organizing, and managing XML documents. In this paper, we propose a similarity measure between XML documents based on the Jaccard similarity function for unstructured text documents. Examples are provided to illustrate the formulation. Given the proposed similarity measure, traditional information retrieval techniques for unstructured text documents can also be applied to XML documents, and the performance of XML document management will be significantly improved.
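
The abstract does not reproduce the paper's actual formulation. As a rough illustration only, the sketch below applies the standard Jaccard coefficient, J(A, B) = |A ∩ B| / |A ∪ B|, to XML documents by treating each document as a set of (element path, word) pairs, so that a word only matches when it appears under the same tag path. This feature choice and the helper names `jaccard` and `path_term_features` are hypothetical assumptions, not the author's method.

```python
import xml.etree.ElementTree as ET


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B|; taken as 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def path_term_features(xml_text: str) -> set[tuple[str, str]]:
    """Hypothetical feature set: (root-to-element tag path, lowercase word) pairs.

    Only each element's own .text is used; tail text and attributes are ignored.
    """
    root = ET.fromstring(xml_text)
    features: set[tuple[str, str]] = set()

    def walk(elem: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{elem.tag}"
        for word in (elem.text or "").lower().split():
            features.add((path, word))
        for child in elem:
            walk(child, path)

    walk(root, "")
    return features


# Usage: two catalog entries that share the tag structure but differ in one word.
doc_a = "<catalog><item><name>Red Pen</name><price>2</price></item></catalog>"
doc_b = "<catalog><item><name>Blue Pen</name><price>2</price></item></catalog>"
print(jaccard(path_term_features(doc_a), path_term_features(doc_b)))  # prints 0.5
```

Here the shared features are ("/catalog/item/name", "pen") and ("/catalog/item/price", "2") out of four distinct features in total, giving 2/4. Pairing each word with its path is one simple way to let document structure influence an otherwise bag-of-words measure; richer schemes (weighting, partial path matches) are possible but are not shown here.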

Similar resources

Metaheuristic Clustering of Persian XML Documents Based on Structural and Content Similarity

Due to the increasing number of XML documents, organizing them effectively in order to retrieve useful information from them is essential. One possible solution is to cluster XML documents in order to discover knowledge. A key issue in clustering XML documents is how to measure the similarity between them. Conventional clustering of text documents using a do...

A Novel Approach to Measuring Structural Similarity between XML Documents

Measuring structural similarity between XML documents has become a key component in various applications, including XML mining, schema matching, and web service discovery, among others. This paper presents a novel structural similarity measure that applies kernel methods to XML documents. Results of preliminary simulations show that this approach outperforms conventional ones.

Structural Similarity Evaluation Between XML Documents and DTDs

The automatic processing and management of XML-based data are increasingly popular research issues due to the ever more abundant use of XML, especially on the Web. Nonetheless, several operations based on the structure of XML data have not yet received strong attention. Among these is the process of matching XML documents and XML grammars, useful in various applications such as documents classifi...

A New Sequential Mining Approach to XML Document Similarity Computation

Measuring the structural similarity among XML documents is the task of finding their semantic correspondence and is fundamental to many web-based applications. While several methods exist to address the problem, the data mining approach seems to be a novel, interesting and promising one. It works on the id...

Prototyping a Vibrato-Aware Query-By-Humming (QBH) Music Information Retrieval System for Mobile Communication Devices: Case of Chromatic Harmonica

Background and Aim: The current research aims at prototyping query-by-humming music information retrieval systems for smartphones. Methods: This multi-method research simultaneously follows the simulation technique from the mixed models of operations research methodology and the documentary research method. Two chromatic harmonica albums comprised the research population. To achieve the purpose ...

Publication date: 2004